图像之间的感知距离在预训练的深度特征的空间中测量,在评估图像相似性方面优于先前的低级,基于像素的指标。虽然众所周知,较旧模型(例如Alexnet和VGG)捕获感知相似性的功能却较少,但研究了现代和更准确的模型。在本文中,我们提出了一项大规模的经验研究,以评估成像网分类器在感知相似性方面的表现。首先,我们观察到成像网的精度与现代网络(例如重置,有效网络和视觉变压器)的感知得分之间的反相关性:更好的分类器达到了较差的感知得分。然后,我们在不同的深度,宽度,训练步骤,重量衰减,标签平滑和辍学时检查了成像网的精度/感知分数关系。更高的精度将感知得分提高到一定点,但是我们在中高精度方面发现了精度和感知得分之间的帕累托前沿。我们使用许多合理的假设,例如失真不变性,空间频率灵敏度和替代感知函数,进一步探索这种关系。有趣的是,我们发现仅在Imagenet上接受少于5个时代训练的浅重新收集和重新注册,其新兴的感知得分与直接受到监督的人类感知判断直接训练的先前最佳网络相匹配。
translated by 谷歌翻译
由于其对人类生命,运输,粮食生产和能源管理的高度影响,因此在科学上研究了预测天气的问题。目前的运营预测模型基于物理学,并使用超级计算机来模拟大气预测,提前预测数小时和日期。更好的基于物理的预测需要改进模型本身,这可能是一个实质性的科学挑战,以及潜在的分辨率的改进,可以计算令人望而却步。基于神经网络的新出现的天气模型代表天气预报的范式转变:模型学习来自数据的所需变换,而不是依赖于手工编码的物理,并计算效率。然而,对于神经模型,每个额外的辐射时间都会构成大量挑战,因为它需要捕获更大的空间环境并增加预测的不确定性。在这项工作中,我们提出了一个神经网络,能够提前十二小时的大规模降水预测,并且从相同的大气状态开始,该模型能够比最先进的基于物理的模型更高的技能HRRR和HREF目前在美国大陆运营。可解释性分析加强了模型学会模拟先进物理原则的观察。这些结果代表了建立与神经网络有效预测的新范式的实质性步骤。
translated by 谷歌翻译
This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones; nonetheless we show that it can be efficiently trained on data with tens of thousands of samples per second of audio. When applied to text-to-speech, it yields state-ofthe-art performance, with human listeners rating it as significantly more natural sounding than the best parametric and concatenative systems for both English and Mandarin. A single WaveNet can capture the characteristics of many different speakers with equal fidelity, and can switch between them by conditioning on the speaker identity. When trained to model music, we find that it generates novel and often highly realistic musical fragments. We also show that it can be employed as a discriminative model, returning promising results for phoneme recognition.
translated by 谷歌翻译
This work explores conditional image generation with a new image density model based on the PixelCNN architecture. The model can be conditioned on any vector, including descriptive labels or tags, or latent embeddings created by other networks. When conditioned on class labels from the ImageNet database, the model is able to generate diverse, realistic scenes representing distinct animals, objects, landscapes and structures. When conditioned on an embedding produced by a convolutional network given a single image of an unseen face, it generates a variety of new portraits of the same person with different facial expressions, poses and lighting conditions. We also show that conditional PixelCNN can serve as a powerful decoder in an image autoencoder. Additionally, the gated convolutional layers in the proposed model improve the log-likelihood of PixelCNN to match the state-ofthe-art performance of PixelRNN on ImageNet, with greatly reduced computational cost.
translated by 谷歌翻译
Modeling the distribution of natural images is a landmark problem in unsupervised learning. This task requires an image model that is at once expressive, tractable and scalable. We present a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions. Our method models the discrete probability of the raw pixel values and encodes the complete set of dependencies in the image. Architectural novelties include fast twodimensional recurrent layers and an effective use of residual connections in deep recurrent networks. We achieve log-likelihood scores on natural images that are considerably better than the previous state of the art. Our main results also provide benchmarks on the diverse ImageNet dataset. Samples generated from the model appear crisp, varied and globally coherent.
translated by 谷歌翻译